On the Analysis of the Illumina 450k Array Data: Probes Ambiguously Mapped to the Human Genome

Authors

  • Xu Zhang
  • Wenbo Mu
  • Wei Zhang
Abstract

The newly developed Illumina HumanMethylation450 BeadChip (450K array; Illumina, Inc., San Diego, CA, USA) allows genome-wide profiling of DNA methylation at >450,000 CpG and non-CpG methylation sites (Sandoval et al., 2011). Using the 450K array, Philibert et al. (2012) examined the relationship between recent alcohol intake and genome-wide methylation patterns in lymphoblast DNA samples derived from 165 female subjects participating in the Iowa Adoption Studies. Their paper demonstrated that the 450K array could be a useful tool for ongoing and newly designed epigenome projects. However, given the unique design of the platform (detailed annotations for the 450K array, including probe sequences, are available at http://www.illumina.com/), some caution is warranted when analyzing 450K array data, beyond the general challenges of analyzing whole-genome DNA methylation data (Laird, 2010). In particular, we found that a substantial proportion of the >450,000 DNA methylation probes on the 450K array do not align to unique, unambiguous loci in the human genome (Moen et al., 2012). In total, ∼140,000 methylation probes mapped ambiguously to multiple locations in the human genome (hg19) when up to two mismatches in the probe sequence were allowed, using Bowtie (v2.0.0 beta2; Langmead et al., 2009; Langmead and Salzberg, 2012). Briefly, Bowtie is an ultrafast, memory-efficient short-read aligner that indexes the genome with an extended Burrows–Wheeler technique and implements a quality-aware backtracking algorithm that permits mismatches (Langmead et al., 2009; Langmead and Salzberg, 2012). Other alignment algorithms, e.g., BLAT (Kent, 2002) and MAQ (Li et al., 2008), provide similar estimates (unpublished data). In comparison, ∼1,000 methylation probes on the earlier Illumina Human Methylation 27K array (27K array; Bell et al., 2011) were found to map ambiguously to the human genome (hg18).
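The idea of flagging a probe as ambiguous when it matches more than one locus with up to two mismatches can be illustrated with a small Python sketch. This is not Bowtie and not the authors' pipeline: it is a naive forward-strand Hamming scan over a made-up toy reference, it ignores bisulfite conversion and the reverse strand, and the probe names, sequences, and function names are all hypothetical.

```python
def within_mismatches(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_hits(probe, reference, max_mismatches=2):
    """Return every start position where probe matches reference within the mismatch budget."""
    k = len(probe)
    return [
        i
        for i in range(len(reference) - k + 1)
        if within_mismatches(probe, reference[i:i + k], max_mismatches)
    ]


def classify_probes(probes, reference, max_mismatches=2):
    """Label each probe 'unmapped', 'unique', or 'ambiguous' by its number of hits."""
    labels = {}
    for probe_id, sequence in probes.items():
        n_hits = len(find_hits(sequence, reference, max_mismatches))
        if n_hits == 0:
            labels[probe_id] = "unmapped"
        elif n_hits == 1:
            labels[probe_id] = "unique"
        else:
            labels[probe_id] = "ambiguous"
    return labels


# Toy reference (hypothetical): three 10-base loci separated by runs of N.
# The second locus differs from the first at a single base.
TOY_REFERENCE = "ACGTACGGTT" + "NNNNN" + "ACGTACGATT" + "NNNNN" + "TTGACCAGTA"

# Toy probes (hypothetical names and sequences, not real 450K probes).
TOY_PROBES = {
    "probe_A": "ACGTACGGTT",  # exact match at one locus, one mismatch at another
    "probe_B": "TTGACCAGTA",  # matches a single locus
    "probe_C": "GGGGGGGGGG",  # matches nowhere within two mismatches
}

if __name__ == "__main__":
    for probe_id, label in classify_probes(TOY_PROBES, TOY_REFERENCE).items():
        print(probe_id, label)
```

A real analysis would use an indexed aligner on the full genome, since a linear scan of this kind does not scale to hundreds of thousands of probes against hg19.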
Because the much more comprehensive 450K array covers not only promoters but also gene bodies, untranslated regions (UTRs), and “open sea” methylation sites, ambiguous alignment particularly needs to be taken into account when analyzing data from this new platform. Notably, 20 CpG methylation probes (e.g., cg24023553 in Table 2; cg00004209 in Table 3; cg24675557 in Table 5) out of the 90 top-ranking probes reported by Philibert et al. (2012) mapped to ambiguous loci in the current human reference (hg19) using Bowtie (Langmead et al., 2009; Langmead and Salzberg, 2012). Since ambiguous alignment to the human genome may cause unreliable measurement of the DNA methylation level at a particular methylation site, accounting for this platform-specific problem may not only facilitate data analysis (e.g., by easing the multiple-testing burden once the affected probes are removed), but also help interpret the results by focusing on more reliable biological signals. In addition, other factors that also affect other platforms such as the 27K array (e.g., polymorphisms in the target sequences, potential batch effects; Bell et al., 2011; Fraser et al., 2012) may need to be considered in the analysis of these data.
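To make the multiple-testing point concrete, the following Python sketch drops probes flagged as ambiguous and compares a Bonferroni threshold before and after filtering. The probe counts are round approximations of the figures quoted in the abstract (about 450,000 probes, of which about 140,000 are ambiguous); Bonferroni is used here only as one simple correction, not as the method of any cited study, and the probe names and p-values are made up for illustration.

```python
def filter_ambiguous(p_values, ambiguous_ids):
    """Return a copy of p_values without the probes listed in ambiguous_ids."""
    return {pid: p for pid, p in p_values.items() if pid not in ambiguous_ids}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def significant_probes(p_values, alpha, n_tests):
    """Sorted IDs of probes whose p-value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(pid for pid, p in p_values.items() if p < threshold)


# Approximate counts taken from the abstract (assumed round numbers).
N_ALL_PROBES = 450_000
N_AMBIGUOUS = 140_000
N_RETAINED = N_ALL_PROBES - N_AMBIGUOUS

# Hypothetical probes and p-values, for illustration only.
TOY_P_VALUES = {
    "probe_1": 1.0e-8,
    "probe_2": 2.0e-7,
    "probe_3": 3.0e-2,
    "probe_4": 1.4e-7,
}
TOY_AMBIGUOUS = {"probe_2"}

if __name__ == "__main__":
    before = significant_probes(TOY_P_VALUES, 0.05, N_ALL_PROBES)
    retained = filter_ambiguous(TOY_P_VALUES, TOY_AMBIGUOUS)
    after = significant_probes(retained, 0.05, N_RETAINED)
    print("threshold, all probes:     ", bonferroni_threshold(0.05, N_ALL_PROBES))
    print("threshold, retained probes:", bonferroni_threshold(0.05, N_RETAINED))
    print("significant before filtering:", before)
    print("significant after filtering: ", after)
```

In this toy setting, testing fewer probes relaxes the per-test threshold, so a borderline probe that maps uniquely can pass after filtering even though it failed when all probes were tested.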


Similar resources

Infinium Monkeys: Infinium 450K Array for the Cynomolgus macaque (Macaca fascicularis)

The Infinium Human Methylation450 BeadChip Array (Infinium 450K) is a robust and cost-efficient survey of genome-wide DNA methylation patterns. Macaca fascicularis (Cynomolgus macaque) is an important disease model; however, its genome sequence is only recently published, and few tools exist to interrogate the molecular state of Cynomolgus macaque tissues. Although the Infinium 450K is a hybrid...


MethylAid: visual and interactive quality control of large Illumina 450k datasets

The Illumina 450k array is a frequently used platform for large-scale genome-wide DNA methylation studies, i.e. epigenome-wide association studies. Currently, quality control of 450k data can be performed with Illumina's GenomeStudio and is part of a limited number of 450k analysis pipelines. However, GenomeStudio cannot handle large-scale studies, and existing pipelines provide limited...


A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data

DNA methylation plays an important role in disease etiology. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a widely used platform in large-scale epidemiologic studies. This platform can efficiently and simultaneously measure methylation levels at ∼480,000 CpG sites in the human genome in multiple study samples. Due to the intrinsic chip design of 2 types of chemistry probes, data...


Paternal sperm DNA methylation associated with early signs of autism risk in an autism-enriched cohort.

Background: Epigenetic mechanisms such as altered DNA methylation have been suggested to play a role in autism, beginning with the classical association of Prader-Willi syndrome, an imprinting disorder, with autistic features. Objectives: Here we tested for the relationship of paternal sperm DNA methylation with autism risk in offspring, examining an enriched-risk cohort of fathers of autistic ...


Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data

Due to their relatively low-cost per sample and broad, gene-centric coverage of CpGs across the human genome, Illumina's 450k arrays are widely used in large scale differential methylation studies. However, by their very nature, large studies are particularly susceptible to the effects of unwanted variation. The effects of unwanted variation have been extensively documented in gene expression a...



Journal title:

Volume 3  Issue 

Pages  -

Publication date 2012